Emergent Structures in Large Networks 

David Aristoff and Charles Radin

October, 2011 



Abstract. We consider a large class of exponential random graph models and prove the existence 
of a region of parameter space corresponding to multipartite structure, separated by a phase 
transition from a region of disordered graphs. 
I. Introduction and statement of results



Complex networks, including the internet, world wide web, social networks, biological
networks, etc., are often modeled by probabilistic ensembles with one or more adjustable parameters;
see for instance [N] and [L], and the many references therein. We will use one of these standard
families, the exponential random graph models (see references in [CD], [F1], [F2], [L] and [N]), to
study how multipartite structure can exist in such networks, stable against random fluctuations,
in imitation of the modeling of crystalline structure of solids in thermal equilibrium.

We will be considering a large family of exponential random graph models, but for simplicity
we first discuss the particular case introduced by Strauss [S] in which a graph $G_N$ with $N$
nodes, $E_N(G_N)$ edges and $T_N(G_N)$ triangles is probabilistically modeled with the following
two-parameter probability mass function:

$$\mathrm{Prob}_{\alpha_1,\alpha_2}(G_N) = \frac{e^{\alpha_1 E_N(G_N) + \alpha_2 T_N(G_N)}}{\text{normalization}}. \tag{1}$$

The maximum number of edges in $G_N$ is of order $N^2$, and for triangles order $N^3$; it will
be useful below if we renormalize quantities. We will work with edge and triangle densities,
$e_N(G_N) = E_N(G_N)/N^2$ and $t_N(G_N) = T_N(G_N)/N^3$, and introduce new parameters
$\beta_1, \beta_2$ so that

$$\mathrm{Prob}_{\beta_1,\beta_2}(G_N) = \frac{e^{N^2[\beta_1 e_N(G_N) + \beta_2 t_N(G_N)]}}{\text{normalization}}. \tag{2}$$

We think of the parameters $\beta_1, \beta_2$ as representing mechanisms for influencing the network, as
pressure and temperature do in models of materials in thermal equilibrium. Indeed it is easy to
see by differentiation that if $\beta_1$ is fixed, varying $\beta_2$ will vary the mean value of the triangle
density; similarly if $\beta_2$ is fixed, varying $\beta_1$ will vary the mean value of the edge density. Furthermore
if the mean value $\mathbb{E}_{\beta_1,\beta_2}[e_N(G_N)]$ of $e_N(G_N)$ is fixed and $\beta_2 \ll 0$ then, as we will see below,
the random graph will have a very low value for the mean value $\mathbb{E}_{\beta_1,\beta_2}[t_N(G_N)]$ of $t_N(G_N)$.
However if $\mathbb{E}_{\beta_1,\beta_2}[e_N(G_N)]$ is fixed any variation of $\beta_2 > 0$ does not affect $\mathbb{E}_{\beta_1,\beta_2}[t_N(G_N)]$ (when
$N$ is large) [RY]. It is natural to treat separately the cases $\beta_2 < 0$ and $\beta_2 > 0$. The former is
called repulsive, the latter attractive; see [RY]. The attractive case $\beta_2 > 0$ has been completely
analyzed in [RY], so we concentrate here on the case with repulsion, $\beta_2 < 0$.
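The differentiation remark above can be checked by brute force for very small $N$. The following Python sketch is an illustration only, not part of the original argument: it assumes that $G_N$ ranges over labeled simple graphs on $N$ nodes, enumerates all of them, and compares a finite-difference derivative of $N^{-2}\ln(\text{normalization})$ in $\beta_2$ with the exact mean triangle density under (2). The function name `strauss_stats` is hypothetical.

```python
import itertools
import math

def strauss_stats(N, beta1, beta2):
    """Exact (ln Z)/N^2, mean edge density and mean triangle density for
    model (2), by enumerating all labeled graphs on N nodes (tiny N only)."""
    pairs = list(itertools.combinations(range(N), 2))
    triples = list(itertools.combinations(range(N), 3))
    weights, e_vals, t_vals = [], [], []
    for mask in itertools.product((0, 1), repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, mask) if bit}
        E = len(edges)
        T = sum(1 for a, b, c in triples
                if (a, b) in edges and (a, c) in edges and (b, c) in edges)
        e, t = E / N**2, T / N**3          # densities as renormalized in the text
        weights.append(math.exp(N**2 * (beta1 * e + beta2 * t)))
        e_vals.append(e)
        t_vals.append(t)
    Z = sum(weights)                        # the normalization in (2)
    mean_e = sum(w * e for w, e in zip(weights, e_vals)) / Z
    mean_t = sum(w * t for w, t in zip(weights, t_vals)) / Z
    return math.log(Z) / N**2, mean_e, mean_t

# Usage: N = 4 has only 2**6 = 64 labeled graphs.
h = 1e-5
psi_plus = strauss_stats(4, 0.5, -1.0 + h)[0]
psi_minus = strauss_stats(4, 0.5, -1.0 - h)[0]
mean_t = strauss_stats(4, 0.5, -1.0)[2]
print((psi_plus - psi_minus) / (2 * h), mean_t)
```

The two printed numbers should agree up to finite-difference error, which is the sense in which $\beta_2$ controls the mean triangle density. The enumeration grows as $2^{N(N-1)/2}$, so this is only a sanity check, not a way to study large $N$.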

It is useful to analyze the phenomenon in the last paragraph, as regards $\beta_2 \ll 0$, in
two stages. First, consider the nonprobabilistic optimization problem in which one maximizes
the edge density among those graphs $G_N$ of $N$ nodes which have no triangles, $t_N(G_N) = 0$,
corresponding intuitively to $\beta_2 = -\infty$. This was solved by Turán [T], who showed that the
optimum is uniquely achieved by the complete bipartite graph with equal size parts. (The parts
differ by 1 if $N$ is odd.) One can understand the Strauss model as a two stage generalization
of this optimization problem. First one considers the two-parameter set of graphs:

$$X_N(e,t) = \{G_N : e_N(G_N) = e,\ t_N(G_N) = t\}. \tag{3}$$

Then one studies the interaction of the two conditions, $e_N(G_N) = e$ and $t_N(G_N) = t$, through
the cardinality $|X_N(e,t)|$ of $X_N(e,t)$. Specifically, consider the entropy, defined on probability
mass functions $p$ on the set $X_N(e,t)$ by:

$$S_N(e,t)[p] = -\sum_{j \in X_N(e,t)} p_j \ln(p_j). \tag{4}$$

It is easy to prove that $S_N(e,t)$ is maximized uniquely by the uniform distribution $p(e,t)$, and
that

$$S_N(e,t)[p(e,t)] = \ln(|X_N(e,t)|). \tag{5}$$
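For small $N$ the sets $X_N(e,t)$ can be enumerated directly. The Python sketch below is an illustration under the assumption that graphs are labeled and that, for fixed $N$, indexing by the counts $(E,T)$ is equivalent to indexing by the densities $(e,t)$. It tabulates $|X_N(e,t)|$ for $N=5$, recovers the extremal triangle-free edge count of Turán's theorem, and evaluates the entropy (4) at the uniform distribution. The helper names are hypothetical.

```python
import itertools
import math
from collections import Counter

def count_by_edges_and_triangles(N):
    """Map (E, T) -> number of labeled graphs on N nodes with E edges, T triangles."""
    pairs = list(itertools.combinations(range(N), 2))
    triples = list(itertools.combinations(range(N), 3))
    counts = Counter()
    for mask in itertools.product((0, 1), repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, mask) if bit}
        T = sum(1 for a, b, c in triples
                if (a, b) in edges and (a, c) in edges and (b, c) in edges)
        counts[(len(edges), T)] += 1
    return counts

def entropy(p):
    """Entropy (4) of a probability mass function given as a list of weights."""
    return -sum(q * math.log(q) for q in p if q > 0)

counts = count_by_edges_and_triangles(5)            # 2**10 = 1024 graphs
max_triangle_free_edges = max(E for (E, T) in counts if T == 0)
size = counts[(max_triangle_free_edges, 0)]         # |X_N| at the Turan optimum
uniform_entropy = entropy([1 / size] * size)        # equals ln(size), as in (5)
skewed_entropy = entropy([0.5] + [0.5 / (size - 1)] * (size - 1))
print(max_triangle_free_edges, size, uniform_entropy, skewed_entropy)
```

For $N=5$ the maximum is attained only by the labeled copies of the complete bipartite graph with parts of sizes 2 and 3, and any non-uniform distribution on that set, such as the skewed one above, has strictly smaller entropy.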

If one alters the optimization so that one doesn't restrict $p$ to be supported in $X_N(e,t)$ but
instead assumes the mean values of the two densities $e_N(G_N)$ and $t_N(G_N)$ with respect to
$p$ are fixed, then using Lagrange multipliers $N^2\beta_1$ for $e_N(G_N)$ and $N^2\beta_2$ for $t_N(G_N)$ in the
optimization of the entropy leads to the unique optimizer given in (2), in which the $\beta$'s control
the mean values. It was shown in [CD] that if $\beta_2 \ll 0$ then in a technical sense $G_N$ for large $N$
looks like the complete bipartite graph with equal parts, with some edges randomly removed,
but in particular $t_N(G_N) \approx 0$. On the other hand we will see that when $-1/3 < \beta_2 < 0$,
the edges in $G_N$ are roughly independent, so fixing $e_N(G_N)$ automatically fixes $t_N(G_N)$, not
leaving any flexibility.

Although the networks corresponding to $\beta_2 > 0$ do not have interesting structure, there is
still an interesting phenomenon in this regime, associated with sensitivity to variation of the
parameters. More specifically, it was proven in [CD] that at certain values of $\beta_2 > 0$ there is a
special value $\beta_1 = \beta_1(\beta_2)$ at which small changes of $\beta_1$ with $\beta_2$ fixed lead to a jump in the mean
value of the density $e_N(G_N)$. Furthermore, it was shown in [RY] that all singular behavior of
the distributions is concentrated on a certain curve $\beta_1 = q(\beta_2)$. (We will clarify the meaning of
"singular" below.)

We are interested here in the more complicated case $\beta_2 < 0$. As noted above, if $|\beta_2| < 1/3$
then for large $N$ the graph will have approximately independent edges; in particular we will
show below that the difference

$$\left(\mathbb{E}_{\beta_1,\beta_2}[e_N(G_N)]\right)^3 - \mathbb{E}_{\beta_1,\beta_2}[t_N(G_N)] \tag{6}$$

has limit 0 as $N \to \infty$. However one might expect from Turán's theorem that for any fixed $\beta_1$
(or mean value of $e_N(G_N)$), once $\beta_2$ is sufficiently negative the graph should look bipartite, and
so the difference should be roughly $\left(\mathbb{E}_{\beta_1,\beta_2}[e_N(G_N)]\right)^3 \neq 0$. We prove this below but furthermore
prove that the qualitative structural change occurs abruptly: in order to accomplish the change,
for each $\beta_1$ the distribution exhibits "singular behavior" at some $\beta_2 < 0$. Before clarifying the
meaning of "singular" we generalize the role played by triangles in the Strauss model to the
following two-parameter exponential random graph model:

$$\mathbb{P}_{\beta_1,\beta_2}(G_N) = e^{N^2[\beta_1 t(H_1,G_N) + \beta_2 t(H_2,G_N) - \psi_N(\beta_1,\beta_2)]}, \tag{7}$$

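where: $H_1$ is an edge, $H_2$ is any finite simple graph with $k > 2$ edges, and $t(H_i,G_N)$ is the
density of graph homomorphisms $H_i \to G_N$:

$$t(H_i,G_N) = \frac{|\mathrm{hom}(H_i,G_N)|}{|V(G_N)|^{|V(H_i)|}}, \tag{8}$$

with $V(\cdot)$ denoting a vertex set. The term $\psi_N(\beta_1,\beta_2)$ gives the probability normalization.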
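To make the homomorphism density concrete, here is a brute-force Python sketch. It assumes the standard definition, namely the fraction of all maps $V(H) \to V(G)$ that send every edge of $H$ to an edge of $G$; the function name `hom_density` and the edge-list encoding are choices made for this illustration.

```python
import itertools

def hom_density(H_edges, H_size, G_edges, G_size):
    """t(H, G): fraction of maps V(H) -> V(G) that send each edge of H to an edge of G."""
    adj = set()
    for u, v in G_edges:            # store both orientations of each undirected edge
        adj.add((u, v))
        adj.add((v, u))
    homs = 0
    for phi in itertools.product(range(G_size), repeat=H_size):
        if all((phi[a], phi[b]) in adj for a, b in H_edges):
            homs += 1
    return homs / G_size ** H_size

edge = [(0, 1)]
triangle = [(0, 1), (1, 2), (0, 2)]
K3 = [(0, 1), (1, 2), (0, 2)]            # complete graph on 3 nodes
C4 = [(0, 1), (1, 2), (2, 3), (3, 0)]    # 4-cycle, which is bipartite

print(hom_density(edge, 2, K3, 3), hom_density(triangle, 3, K3, 3))
print(hom_density(edge, 2, C4, 4), hom_density(triangle, 3, C4, 4))
```

The maps are not required to be injective, which is why this density differs by lower-order terms from a normalized subgraph count. A bipartite graph such as the 4-cycle admits no homomorphism from a triangle, so its triangle density is exactly zero.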
Fundamental to our results are questions of analyticity of the normalization in (7), which
we discuss next. (See [KP] for elementary properties of real analytic functions of several real
variables.) An explicit formulation of the normalization is:



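$$\psi_N(\beta_1,\beta_2) = \frac{1}{N^2} \ln \sum_{G_N} e^{N^2[\beta_1 t(H_1,G_N) + \beta_2 t(H_2,G_N)]}. \tag{9}$$

It is proven in [CD] that

$$\psi_\infty(\beta_1,\beta_2) = \lim_{N\to\infty} \psi_N(\beta_1,\beta_2) \tag{10}$$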

exists for all $\beta_1, \beta_2$. It is also noted in [RY] that at points where $\psi_\infty$ is analytic,

$$\frac{\partial^{m+n}}{\partial\beta_1^m\,\partial\beta_2^n}\psi_\infty(\beta_1,\beta_2) = \lim_{N\to\infty}\frac{\partial^{m+n}}{\partial\beta_1^m\,\partial\beta_2^n}\psi_N(\beta_1,\beta_2), \tag{11}$$

that is, the partial derivatives commute with the limit $N \to \infty$. Partial derivatives of $\psi_\infty$, when
they exist, give information on the large-$N$ mean and variance of the densities $t(H_1,G_N)$ and
$t(H_2,G_N)$ (see [RY]) and it is standard in the corresponding modeling of materials, in part for
this reason, to define phases and phase transitions as follows (see [FR]).

Definition. A phase is an open connected region of the parameter space $\{(\beta_1,\beta_2)\}$ which is
maximal for the condition that $\psi_\infty(\beta_1,\beta_2)$ is analytic. There is a phase transition at $(\beta_1^*,\beta_2^*)$
if $(\beta_1^*,\beta_2^*)$ is a boundary point of an open set on which $\psi_\infty$ is analytic, but $\psi_\infty$ is not analytic
at $(\beta_1^*,\beta_2^*)$.

So in the above, "singular" meant nonanalytic. In this notation our main result is: 

Theorem. Assume the chromatic number $\chi(H_2)$ of $H_2$ is at least 3. Then there is a curve
$\beta_2 = s(\beta_1)$, $-\infty < \beta_1 < \infty$, in the lower half plane ($\beta_2 < 0$), such that the model exhibits a
phase transition on the curve.

II. Proof of the theorem 

Let $k$ be the number of edges in $H_2$. We write $\mathbb{P}$ for the probability mass function $\mathbb{P}_{\beta_1,\beta_2}$
given by equation (7), and $\mathbb{E}$ for the expectation $\mathbb{E}_{\beta_1,\beta_2}$.

By Theorem 6.1 in [CD], the analyticity method of the proof of Theorem 3.10 in [RY] can
be immediately extended to prove that $\psi_\infty(\beta_1,\beta_2)$ is analytic in the real variables $\beta_1$ and $\beta_2$
when $|\beta_2| < 2/[k(k-1)]$. Our proof will be by contradiction, so we assume from here on that
$\psi_\infty(\beta_1,\beta_2)$ is analytic in $\beta_1$ and $\beta_2$ on the entire half line $L = \{(\beta_1^*,\beta_2) : \beta_2 < 0\}$, where $\beta_1^*$ is
arbitrary but fixed. We will find a contradiction, which will prove the existence of the curve
$\beta_2 = s(\beta_1)$.

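Consider the function

$$C(\beta_1,\beta_2) := \left(\frac{\partial}{\partial\beta_1}\psi_\infty(\beta_1,\beta_2)\right)^k - \frac{\partial}{\partial\beta_2}\psi_\infty(\beta_1,\beta_2). \tag{12}$$

Note that $C(\beta_1,\beta_2)$ is analytic on $L$, since $\psi_\infty(\beta_1,\beta_2)$ is.

Proposition 3.2 in [RY] proves, for all $\beta_2 < 0$, there is a unique solution $u^*(\beta_1,\beta_2)$ to the
maximization of

$$\beta_1 u + \beta_2 u^k - \frac{1}{2}u\ln u - \frac{1}{2}(1-u)\ln(1-u) \tag{13}$$

for $u \in [0,1]$. Then from Theorems 6.1 and 4.2 in [CD] we can use the same argument as used
to prove equations (33) and (34) in [RY] to prove, for $-2/[k(k-1)] < \beta_2 < 0$:

$$\frac{\partial}{\partial\beta_1}\psi_\infty(\beta_1,\beta_2) = \lim_{N\to\infty}\mathbb{E}\{t(H_1,G_N)\} = t(H_1,u^*) = u^*(\beta_1,\beta_2), \tag{14}$$

$$\frac{\partial}{\partial\beta_2}\psi_\infty(\beta_1,\beta_2) = \lim_{N\to\infty}\mathbb{E}\{t(H_2,G_N)\} = t(H_2,u^*) = (u^*(\beta_1,\beta_2))^k. \tag{15}$$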

It follows that $C(\beta_1^*,\beta_2) = (t(H_1,u^*))^k - t(H_2,u^*) = 0$ for $|\beta_2| < 2/[k(k-1)]$. Since a function
of one variable which is analytic on $L$ and constant on a subinterval must be constant on $L$, it
follows that

$$C(\beta_1^*,\beta_2) = 0 \text{ on } L. \tag{16}$$
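The scalar problem (13) is easy to explore numerically. The sketch below is an illustration only: it assumes $\beta_2 \le 0$ and $k \ge 2$, in which case the objective is strictly concave on $(0,1)$ and its derivative $\beta_1 + k\beta_2 u^{k-1} - \frac{1}{2}\ln\frac{u}{1-u}$ is strictly decreasing, so the maximizer can be located by bisection. The names `stationarity` and `u_star` are hypothetical, and the bracket assumes $|\beta_1|$ is moderate.

```python
import math

def stationarity(u, beta1, beta2, k):
    """Derivative in u of beta1*u + beta2*u**k - (1/2)u ln u - (1/2)(1-u) ln(1-u)."""
    return beta1 + k * beta2 * u ** (k - 1) - 0.5 * math.log(u / (1.0 - u))

def u_star(beta1, beta2, k, tol=1e-12):
    """Bisection for the maximizer of (13) on (0, 1), assuming beta2 <= 0 and k >= 2."""
    lo, hi = 1e-15, 1.0 - 1e-15
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if stationarity(mid, beta1, beta2, k) > 0:
            lo = mid            # objective still increasing: maximizer lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Usage with k = 3 (H_2 a triangle) and a weakly repulsive beta2.
u = u_star(0.3, -0.2, 3)
print(u, u ** 3)    # the values of (14) and (15) for a constant graphon u
```

For a constant graphon $u$ one has $t(H_1,u) = u$ and $t(H_2,u) = u^k$, so the combination $(t(H_1,u))^k - t(H_2,u)$ vanishes identically, which is the mechanism behind (16). At $\beta_2 = 0$ the stationarity condition reduces to $u = e^{2\beta_1}/(1+e^{2\beta_1})$.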

We next obtain a contradiction to (16), but first we need some notation; see [CD], [BCL] 
and [L] for discussions of the ideas behind these terms, which basically provide the framework 
for "infinite volume limits" for graphs, in analogy with the infinite volume limit in statistical 
mechanics [R]. 

To each graph $G$ on $N$ nodes we associate the following function on $[0,1]^2$:

$$f^G(x,y) = 1 \text{ if } (\lceil Nx\rceil, \lceil Ny\rceil) \text{ is an edge of } G, \text{ and } f^G(x,y) = 0 \text{ otherwise}. \tag{17}$$

We define $W$ to be the space of measurable functions $h : [0,1]^2 \to [0,1]$ which are symmetric:
$h(x,y) = h(y,x)$, for all $x,y$. For $h \in W$ we define

$$t(H,h) = \int_{[0,1]^\ell} \prod_{(i,j)\in E(H)} h(x_i,x_j)\,dx_1\cdots dx_\ell, \tag{18}$$

where $E(H)$ is the edge set of $H$, and $\ell = |V(H)|$ is the number of nodes in $H$, and note that
for a graph $G$, $t(H,G)$ defined in (8) has the same value as $t(H,f^G)$. We define an equivalence
relation on $W$ as follows: $f \sim g$ if and only if $t(H,f) = t(H,g)$ for every simple graph $H$.
Elements of the quotient space, $\tilde W$, are called "graphons", and the class containing $h \in W$ is
denoted $\tilde h$. The space $\tilde W$ is compact [L].

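On $\tilde W$ we define a metric in steps as follows. First, on $W$ we define

$$d_\square(f,g) = \sup_{S,T\subseteq[0,1]} \left|\int_{S\times T} [f(x,y)-g(x,y)]\,dx\,dy\right|. \tag{19}$$

Let $\Sigma$ be the space of measure preserving bijections $\sigma$ of $[0,1]$, and for $f$ in $W$ and $\sigma \in \Sigma$ define
$f_\sigma(x,y) = f(\sigma(x),\sigma(y))$. Using this we define a metric on $\tilde W$ by

$$\delta_\square(\tilde f,\tilde g) = \inf_{\sigma_1,\sigma_2} d_\square(f_{\sigma_1},g_{\sigma_2}). \tag{20}$$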
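For step functions the supremum in (19) can be computed exactly. The following Python sketch is an illustration with a hypothetical function name: it assumes $f$ and $g$ are constant on an $n \times n$ grid of equal blocks, takes the absolute value of the integral as is usual for the cut norm, and uses the fact that the integral is bilinear in the block weights of $S$ and $T$, so it suffices to search over unions of blocks. It computes only the labeled distance $d_\square$, not the infimum over rearrangements in (20).

```python
import itertools

def cut_distance_labeled(A, B):
    """d_square of (19) for step functions with n equal blocks and values A[i][j], B[i][j]."""
    n = len(A)
    D = [[A[i][j] - B[i][j] for j in range(n)] for i in range(n)]
    best = 0.0
    for S in itertools.product((0, 1), repeat=n):     # S as a union of blocks
        # Sum of D over rows in S, one entry per column block j.
        col = [sum(D[i][j] for i in range(n) if S[i]) for j in range(n)]
        pos = sum(c for c in col if c > 0)            # best T for a positive integral
        neg = -sum(c for c in col if c < 0)           # best T for a negative integral
        best = max(best, pos, neg)
    return best / n ** 2                              # each block has area 1/n^2

bipartite = [[0.0, 1.0], [1.0, 0.0]]   # graphon of the complete bipartite graph
half = [[0.5, 0.5], [0.5, 0.5]]        # constant graphon with the same edge density
print(cut_distance_labeled(bipartite, half))
```

The bipartite step function and the constant $1/2$ have the same edge density yet sit at positive cut distance, which is the kind of separation between multipartite and disordered structure that the metric is meant to detect. The search is exponential in $n$, so this is only practical for a handful of blocks.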
Next we need a few terms associated with $\psi_\infty$. Define on $[0,1]$:

$$I(u) = \frac{1}{2}u\ln(u) + \frac{1}{2}(1-u)\ln(1-u), \tag{21}$$

and on $\tilde W$:

$$I(\tilde h) = \int_{[0,1]^2} I(h(x,y))\,dx\,dy. \tag{22}$$

Also on $\tilde W$ we define:

$$T(\tilde h) = \beta_1 t(H_1,h) + \beta_2 t(H_2,h). \tag{23}$$



The above are relevant because it is proven in Theorem 3.1 of [CD] that $\psi_\infty(\beta_1,\beta_2)$ is the
solution of an optimization problem:

$$\psi_\infty(\beta_1,\beta_2) = \sup_{\tilde h\in\tilde W} [T(\tilde h) - I(\tilde h)]. \tag{24}$$

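Furthermore, from Theorem 3.2 of [CD] one has some control on the asymptotic behavior as
$N \to \infty$:

$$\delta_\square(\tilde G_N, F^*(\beta_1,\beta_2)) \to 0 \text{ in probability as } N \to \infty, \tag{25}$$

where $F^*(\beta_1,\beta_2)$ is the (compact) subset of $\tilde W$ on which $T - I$ is maximized, and $\tilde G_N = \tilde f^{G_N}$.

We now return to our proof. Fix $\epsilon > 0$ and $i \in \{1,2\}$. Recall $\beta_1 = \beta_1^*$ is fixed arbitrarily.
Write $F^*(\beta_2)$ for the set $F^*(\beta_1^*,\beta_2) \subset \tilde W$ defined above. Using Theorem 7.1 in [CD], choose $\beta_2'$
sufficiently negative so that for every $\beta_2 < \beta_2'$

$$\sup_{\tilde f\in F^*(\beta_2)} \delta_\square(\tilde f, p\tilde g) < \frac{\epsilon}{3k}, \tag{26}$$

where $p\tilde g$ denotes the limiting graphon provided by Theorem 7.1 in [CD].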

Using Theorem 3.2 in [CD], choose $N_0(\beta_2)$ such that $N > N_0(\beta_2)$ implies

$$\mathbb{P}\left(\delta_\square(\tilde G_N, F^*(\beta_2)) > \frac{\epsilon}{3k}\right) < \frac{\epsilon}{3k}. \tag{27}$$



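Let $A_{\epsilon,N} = \{G_N : \delta_\square(\tilde G_N, F^*(\beta_2)) < \epsilon/(3k)\}$. By compactness of $F^*(\beta_2)$ we may choose
$\tilde f_{G_N} \in F^*(\beta_2)$ corresponding to each $G_N \in A_{\epsilon,N}$ such that

$$\delta_\square(\tilde G_N, \tilde f_{G_N}) < \frac{\epsilon}{3k}. \tag{28}$$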

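Write $\mathbb{E}_A$ for the restriction of the expectation to the set $A$. Using (26) and (28) we have that

$$\mathbb{E}_{A_{\epsilon,N}}[\delta_\square(\tilde G_N, p\tilde g)] = \sum_{G_N\in A_{\epsilon,N}} \delta_\square(\tilde G_N, p\tilde g)\,\mathbb{P}(G_N) \le \sum_{G_N\in A_{\epsilon,N}} \left[\delta_\square(\tilde G_N,\tilde f_{G_N}) + \delta_\square(\tilde f_{G_N}, p\tilde g)\right]\mathbb{P}(G_N) < \frac{2\epsilon}{3k} \tag{29}$$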



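for $N > N_0(\beta_2)$. Now write $A^c_{\epsilon,N}$ for the complement of $A_{\epsilon,N}$. Then by Lemma 3.12 in [RY],
equation (29), and the fact that $\delta_\square(\cdot,\cdot) \le 1$,

$$\begin{aligned}
\left|\mathbb{E}[t(H_i,G_N)] - t(H_i,pg)\right| &\le \mathbb{E}\left[\,|t(H_i,G_N) - t(H_i,pg)|\,\right] \\
&\le k\,\mathbb{E}[\delta_\square(\tilde G_N, p\tilde g)] \\
&< k\left(\frac{2\epsilon}{3k} + \frac{\epsilon}{3k}\right) = \epsilon
\end{aligned} \tag{30}$$

for $N > N_0(\beta_2)$. Using the identity

$$\frac{\partial}{\partial\beta_i}\psi_N(\beta_1,\beta_2) = \mathbb{E}[t(H_i,G_N)] \tag{31}$$

along with (11), we may take the limit $N \to \infty$ in (30) to obtain

$$\left|\frac{\partial}{\partial\beta_i}\psi_\infty(\beta_1^*,\beta_2) - t(H_i,pg)\right| \le \epsilon. \tag{32}$$

Since $\epsilon > 0$ was arbitrary,

$$\lim_{\beta_2\to-\infty}\frac{\partial}{\partial\beta_i}\psi_\infty(\beta_1^*,\beta_2) = t(H_i,pg). \tag{33}$$

Direct computation using equation (2.10) in [CD] yields:

$$t(H_2,pg) = 0 \quad\text{and}\quad t(H_1,pg) = \frac{e^{2\beta_1}(\chi(H_2)-2)}{(1+e^{2\beta_1})(\chi(H_2)-1)} > 0. \tag{34}$$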



Now, by combining (12) with (33)-(34) we find $\lim_{\beta_2\to-\infty} C(\beta_1^*,\beta_2) > 0$, in contradiction with
(16), which proves the theorem. ■
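The sign of the limiting order parameter can be illustrated numerically. The Python sketch below rests on an assumption taken from Theorem 7.1 in [CD] rather than from the text above: that $pg$ is the step function equal to $p = e^{2\beta_1}/(1+e^{2\beta_1})$ between distinct parts of an equal partition of $[0,1]$ into $\chi(H_2)-1$ parts, and 0 within a part. It evaluates the integral (18) exactly for such step functions by summing over block assignments. The names `step_density` and `multipartite` are hypothetical.

```python
import itertools
import math

def step_density(H_edges, H_size, M):
    """t(H, h) from (18) for a step function h with equal blocks and values M[a][b]."""
    n = len(M)
    total = 0.0
    for phi in itertools.product(range(n), repeat=H_size):
        total += math.prod(M[phi[a]][phi[b]] for a, b in H_edges)
    return total / n ** H_size          # each block assignment has measure n**(-H_size)

def multipartite(parts, p):
    """Block values of p*g: p between distinct parts, 0 within a part (assumed form)."""
    return [[0.0 if a == b else p for b in range(parts)] for a in range(parts)]

edge = [(0, 1)]
triangle = [(0, 1), (1, 2), (0, 2)]     # H_2 = triangle: k = 3, chromatic number 3

beta1 = 0.0
p = math.exp(2 * beta1) / (1 + math.exp(2 * beta1))
M = multipartite(2, p)                   # chi(H_2) - 1 = 2 parts
C_limit = step_density(edge, 2, M) ** 3 - step_density(triangle, 3, M)
C_disordered = step_density(edge, 2, [[p]]) ** 3 - step_density(triangle, 3, [[p]])
print(C_limit, C_disordered)
```

Under these assumptions the bipartite step function gives a strictly positive value of $(t(H_1,\cdot))^k - t(H_2,\cdot)$, while any constant graphon gives zero, which is the contrast between the two regimes that the proof exploits.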



III. Conclusion 

Consider any of the two-parameter exponential random graph models with repulsion covered
by our theorem. Define the 'high energy phase' of the parameter space $\{(\beta_1,\beta_2) \mid \beta_2 < 0\}$ as
that domain of analyticity of $\psi_\infty(\beta_1,\beta_2)$ which contains the strip $-2/[k(k-1)] < \beta_2 < 0$.
The order parameter $C(\beta_1,\beta_2)$ is identically zero in this phase (as one can see for instance by
connecting by an analytic curve any given point in the open, connected phase to a point in the
strip, and complexifying). We have proven that this phase is separated from the low energy
regime in the sense that for each $\beta_1$ there is some $\beta_2'$ such that the segment $\{(\beta_1,\beta_2) \mid \beta_2 < \beta_2'\}$
does not intersect the phase. Our proof is based on the traditional modeling of equilibrium
statistical mechanics using analyticity and an order parameter [R], [K], [Y]. And we emphasize
that this method could not have been used to prove the transition found in [RY] for attractive
exponential random graph models since there is a critical point for that transition: indeed there
is only one phase.

In comparison with traditional models from statistical mechanics, exponential random graph 
models could be thought of as either infinite range, or infinite dimensional, which suggests a 
relation with 'mean field theories' [K], [Y]. Mean field theories grew out of the work of van 
der Waals, who obtained a general description of fluids by adroitly replacing the interaction 
between each molecule and the rest of the fluid by an average or mean field, among other things 
losing track of the spatial separation of the interacting molecules. This proved to be a useful 
approximation to understand gas/liquid phase transitions in which the most relevant part of the
particle interaction is a (long range) attraction. It is not too surprising that exponential random 
graph models with attractive interaction could therefore yield a phase transition like that of 
the liquid/gas transition, as was shown in [RY]. In the present paper we obtain a transition 
more like a fluid/solid transition, in which there is a change of 'symmetry' from disordered to 
multipartite. It is less intuitive to use a long range repulsion to model a solid/fluid transition, so 
the materials analogy of our models with repulsion is less compelling than for our models with 
attraction. Therefore the relation of these models with repulsion to mean field approximations 
might be particularly illuminating. 

There remain many open questions. Perhaps the most pressing is the character of the
singularity of $\psi_\infty(\beta_1,\beta_2)$ at the boundary of the high energy phase. In the attractive case
there is only one phase but there are jump discontinuities, in the first derivatives of $\psi_\infty(\beta_1,\beta_2)$
(namely the average edge and energy densities), across a curve where two regions of the phase
abut, while the edges are independent throughout the phase [RY]. We do not know the nature
of the singularity at the boundary of the high energy phase for the case of repulsion studied
in this paper, though we expect the first derivatives of $\psi_\infty(\beta_1,\beta_2)$ to be discontinuous across
the boundary. In analogy with equilibrium materials there may be multipartite phases with
different numbers of parts at low energy, though this may require more complicated interactions
[CD].